Overview

Red Hat OpenShift Container Platform is a Platform as a Service (PaaS) that provides developers and IT organizations with a cloud application platform for deploying new applications on secure, scalable resources with minimal configuration and management overhead. OpenShift Container Platform supports a wide selection of programming languages and frameworks, such as Java, Ruby, and PHP.

Built on Red Hat Enterprise Linux and Kubernetes, OpenShift Container Platform provides a secure and scalable multi-tenant operating system for today’s enterprise-class applications, while providing integrated application runtimes and libraries. OpenShift Container Platform brings the OpenShift PaaS platform to customer data centers, enabling organizations to implement a private PaaS that meets security, privacy, compliance, and governance requirements.

About This Release

Red Hat OpenShift Container Platform version 3.7 (RHSA-2017:3188) is now available. This release is based on OpenShift Origin 3.7. New features, changes, bug fixes, and known issues that pertain to OpenShift Container Platform 3.7 are included in this topic.

OpenShift Container Platform 3.7 is supported on RHEL 7.3, 7.4.2, and Atomic Host 7.4.2 and newer with the latest packages from Extras, including Docker 1.12.

For initial installations, see the Installing a Cluster topics in the Installation and Configuration documentation.

To upgrade to this release from a previous version, see the Upgrading a Cluster topics in the Installation and Configuration documentation.

New Features and Enhancements

This release adds improvements related to the following components and concepts.

Container Orchestration

Kubernetes Upstream

Many core features Google announced in June for Kubernetes 1.7 were the result of OpenShift engineering. Red Hat continues to influence the product in the areas of storage, networking, resource management, authentication and authorization, multi-tenancy, security, service deployments, templating, and controller functionality.

CRI-O (Technology Preview)

This feature is currently in Technology Preview and not for production workloads. Builds do not yet work with CRI-O.

CRI-O v1.0 is a lightweight, native Kubernetes container runtime interface. By design, it provides only the runtime capabilities needed by the kubelet. CRI-O is designed to be part of Kubernetes and evolve in lock-step with the platform.

CRI-O brings:

  • A minimal and secure architecture.

  • Excellent scale and performance.

  • The ability to run any Open Container Initiative (OCI) or docker image.

  • Familiar operational tooling and commands.

To install and run CRI-O alongside docker, set the following in the [OSEv3:vars] section of the Ansible inventory file during cluster installation:

openshift_use_crio=true

This setting pulls the openshift3/cri-o system container image from the Red Hat Registry by default. If you want to use an alternative CRI-O system container image from another registry, you can also override the default using the following variable:

openshift_crio_systemcontainer_image_override=<registry>/<repo>/<image>:<tag>

The atomic-openshift-node service must be RPM- or system container-based when using CRI-O; it cannot be docker container-based. The installer protects against using CRI-O with docker container nodes and halts the installation if this configuration is detected.

When CRI-O use is enabled, it is installed alongside docker, which currently is required to perform build and push operations to the registry. Over time, temporary docker builds can accumulate on nodes. You can optionally set the following to enable garbage collection, which adds a daemonset to clean out the builds:

openshift_crio_enable_docker_gc=true

When enabled, it will run garbage collection on all nodes by default. You can also limit the running of the daemonset on specific nodes by setting the following:

openshift_crio_docker_gc_node_selector={'runtime': 'cri-o'}

For example, the above would ensure it is only run on nodes with the runtime: cri-o label. This can be helpful if you are running CRI-O only on some nodes, and others are only running docker.
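
For reference, a minimal sketch of how these settings might look together in the [OSEv3:vars] section of the inventory file; the commented registry value is a placeholder, not a real image location:

[OSEv3:vars]
# Install and run CRI-O alongside docker.
openshift_use_crio=true
# Optional: use an alternative CRI-O system container image.
#openshift_crio_systemcontainer_image_override=registry.example.com/openshift3/cri-o:latest
# Optional: clean up temporary docker builds on labeled nodes.
openshift_crio_enable_docker_gc=true
openshift_crio_docker_gc_node_selector={'runtime': 'cri-o'}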

See the upstream documentation for more information on CRI-O.

Cluster-wide Tolerations and Per-namespace Tolerations to Control Pod Placement

In a multi-tenant environment, you want to leverage admission controllers to help define rules that can help govern a cluster, should a tenant not set a toleration for placement.

The following options are offered to administrators, where the namespace setting overrides the cluster setting:

  • Cluster-wide and per-namespace default toleration for pods.

  • Cluster-wide and per-namespace white-listing of toleration for pods.

Cluster-wide Off Example
admissionConfig:
  pluginConfig:
    PodTolerationRestriction:
      configuration:
        kind: DefaultAdmissionConfig
        apiVersion: v1
        disable: true
Cluster-wide On Example
admissionConfig:
  pluginConfig:
    PodTolerationRestriction:
      configuration:
        apiVersion: podtolerationrestriction.admission.k8s.io/v1alpha1
        kind: Configuration
        default:
         - key: key3
           value: value3
        whitelist:
         - key: key1
           value: value1
         - key: key3
           value: value3
Namespace-specific Example
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/sa.scc.mcs: s0:c8,c7
    openshift.io/sa.scc.supplemental-groups: 1000070000/10000
    openshift.io/sa.scc.uid-range: 1000070000/10000
    scheduler.alpha.kubernetes.io/defaultTolerations: '[ { "key": "key1", "value":"value1" }]'
    scheduler.alpha.kubernetes.io/tolerationsWhitelist: '[ { "key": "key1", "value":
      "value1" }, { "key": "key2", "value": "value2" } ]'
  generateName: dma-
spec:
  finalizers:
  - openshift.io/origin
  - kubernetes

Security

Documented Private and Public Key Configurations and Crypto Levels

While OpenShift Container Platform is a secured-by-default implementation of Kubernetes, there is now documentation on what security protocols and ciphers are used.

OpenShift Container Platform leverages Transport Layer Security (TLS) cipher suites and JSON Web Algorithms (JWA) crypto algorithms, and uses external libraries such as the Generic Security Service Application Program Interface (GSSAPI) and libgpgme.

Private and public key configurations and Crypto levels are now documented for OpenShift Container Platform.

Node Authorizer and Node Restriction Admission Plug-in

Pods can no longer try to gain information from secrets, configuration maps, PV, PVC, or API objects from other nodes.

The node authorizer governs which API operations a kubelet can perform, spanning read-, write-, and auth-related operations. In order for the admission controller to know the identity of the node and enforce the rules, nodes are provisioned with credentials that identify them with the user name system:node:<nodename> and group system:nodes.

These enforcements are in place by default on all new installations of OpenShift Container Platform 3.7. For upgrades from OpenShift Container Platform 3.6, they are not in place because the system:nodes group was granted broad permissions in OpenShift Container Platform 3.6. To turn the enforcements on, run:

# oadm policy remove-cluster-role-from-group system:node system:nodes

Advanced Auditing

With Advanced Auditing, administrators are now exposed to more information from the API call within the audit trail. This provides deeper traceability of what is occurring across the cluster. All login events, as well as modifications to role bindings and SCCs, are also captured at the default logging level.

OpenShift Container Platform now has an audit policyFile or policyConfiguration where administrators can filter what they want to capture.
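
As an illustration only, an advanced audit configuration in master-config.yaml might look similar to the following sketch; the file paths and retention values shown are assumptions, not recommendations:

auditConfig:
  enabled: true
  auditFilePath: /var/log/openshift-audit.log
  maximumFileRetentionDays: 10
  maximumFileSizeMegabytes: 10
  maximumRetainedFiles: 10
  logFormat: json
  policyFile: /etc/origin/master/audit-policy.yaml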

See Advanced Audit for more information.

Complete Upstreaming of RBAC, Then Downstreaming it Back into OpenShift

The rolebinding and RBAC experience is now the same across all Kubernetes distributions.

Administrators do not have to do anything for this migration to occur. The upgrade process to OpenShift Container Platform 3.7 offers a seamless experience. Now, the user experience is consistent with upstream.

A role can be defined within a namespace with a Role, or cluster-wide with a ClusterRole.

A RoleBinding or ClusterRoleBinding binds a role to subjects. Subjects can be groups, users, or service accounts. A role binding grants the permissions defined in a role.
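
For example, a minimal sketch of a namespaced role and a binding that grants it to a user; the project, role, and user names here are hypothetical:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: project1
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: read-pods
  namespace: project1
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io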

Issue Longer-lived API Tokens to OAuth Clients

Administrators now have the ability to set different token timeouts for the different ways users connect to OpenShift Container Platform (for example, via the oc command line, from a GitHub authentication, or from the web console).

Administrators can edit oauthclients and set the accessTokenMaxAgeSeconds to a time value in seconds that meets their needs.

There are three possible OAuth client types:

  1. openshift-web-console - The client used to request tokens for the OpenShift web console.

  2. openshift-browser-client - The client used to request tokens at /oauth/token/request with a user-agent that can handle interactive logins, such as using Auth from GitHub, Google Authenticator, and so on.

  3. openshift-challenging-client - The client used to request tokens with a user-agent that can handle WWW-Authenticate challenges, such as the oc command line.

    • When accessTokenMaxAgeSeconds is set to 0, tokens do not expire.

    • When left blank, OpenShift Container Platform uses the definition in master-config.

    • Edit the client of interest via:

      # oc edit oauthclients openshift-browser-client
    • Set accessTokenMaxAgeSeconds to 600.

    • Check the setting via:

      # oc get oauthaccesstoken
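
For reference, a minimal sketch of the relevant field as it might appear when editing an OAuth client; all other fields of the object are omitted here:

kind: OAuthClient
apiVersion: v1
metadata:
  name: openshift-browser-client
accessTokenMaxAgeSeconds: 600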

See Other API Objects for more information.

Security Context Constraints Now Support flexVolume

flexVolumes allow users to integrate with new APIs easily by mounting in the items needed for integration. For example, certain files can be bind mounted in, without overwriting whole directories, to integrate with Kerberos.

Administrators are now able to grant users access to specific flexVolume driver names. Previously, the only way administrators could restrict flexVolumes was by setting them as on or off.
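
As a rough sketch only, assuming the allowedFlexVolumes SCC field and a hypothetical driver name, the restriction might be expressed as follows (other required SCC fields are omitted):

kind: SecurityContextConstraints
apiVersion: v1
metadata:
  name: flex-example
volumes:
- flexVolume
allowedFlexVolumes:
- driver: example.com/lvm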

Storage

Local Storage Persistent Volumes (Technology Preview)

Local storage persistent volumes is a feature currently in Technology Preview and not for production workloads.

Local persistent volumes (PVs) now allow tenants to request storage that is local to a node through the regular persistent volume claim (PVC) process, without needing to know the node. Local storage is commonly used in data store applications.

The administrator needs to create the local storage on the nodes, mount them under directories, and then manually create the persistent volume (PV). Alternatively, they can use an external provisioner and feed it the node configuration via configMaps.

Example persistent volume named example-local-pv that some tenants can now claim:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
  annotations:
    "volume.alpha.kubernetes.io/node-affinity": '{
      "requiredDuringSchedulingIgnoredDuringExecution": {
        "nodeSelectorTerms": [
          { "matchExpressions": [
            { "key": "kubernetes.io/hostname",
              "operator": "In",
              "values": ["my-node"]
            }
          ]}
         ]}
        }'
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/vol1

Tenant-driven Storage Snapshotting (Technology Preview)

Tenant-driven storage snapshotting is currently in Technology Preview and not for production workloads.

Tenants now have the ability to leverage the underlying storage technology backing the persistent volume (PV) assigned to them to make a snapshot of their application data. Tenants can also now restore a given snapshot from the past to their current application.

An external provisioner is used to access the EBS, GCE pDisk, HostPath, and Cinder snapshotting APIs. This Technology Preview feature has been tested with EBS and HostPath. The tenant must stop the pods and start them manually.

  1. The administrator runs an external provisioner for the cluster. These are images from the Red Hat Container Catalog.

  2. The tenant creates a PVC and owns a PV from one of the supported storage solutions. The administrator must create a new StorageClass in the cluster with:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: snapshot-promoter
    provisioner: volumesnapshot.external-storage.k8s.io/snapshot-promoter
  3. The tenant can create a snapshot of a PVC named gce-pvc and the resulting snapshot will be called snapshot-demo.

    $ oc create -f snapshot.yaml
    
    apiVersion: volumesnapshot.external-storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: snapshot-demo
      namespace: myns
    spec:
      persistentVolumeClaimName: gce-pvc
  4. Now, they can restore their pod to that snapshot.

    $ oc create -f restore.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: snapshot-pv-provisioning-demo
      annotations:
        snapshot.alpha.kubernetes.io/snapshot: snapshot-demo
    spec:
      storageClassName: snapshot-promoter

Storage Classes Get Zones

Public clouds are particular about not allowing storage to cross zones or regions, so tenants need an ability at times to specify a particular zone.

In OpenShift Container Platform 3.7, administrators can now leverage a zone’s definition within the StorageClass:

kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: slow
provisioner: kubernetes.io/<provisioner>
parameters:
  type: pd-standard
  zones: zone1,zone2

Increased Persistent Volume Density Support by CNS

Container-native storage (CNS) on OpenShift Container Platform 3.7 now supports much higher persistent volume density (three times more) to support a large number of applications at scale. This is due to the introduction of brick-multiplexing support in GlusterFS.

Over 1,000 volumes in a 3-node cluster with 32 GB of RAM per node available to GlusterFS has been successfully tested. Also, 300 Block PVs are supported now on 3-node CNS.

CNS Multi-protocol (File, Block, and S3) Support for OpenShift

Container-native storage (CNS) is now extended to support iSCSI and S3 back ends for OpenShift Container Platform. Heketi is enhanced to support persistent volume (PV) expansion, volume options, and HA.

Block device-based RWO implementation is added to CNS to improve the performance of Elasticsearch, PostgreSQL, and so on. With OpenShift Container Platform 3.7, Elasticsearch and Cassandra are fully supported.

CNS Full Support for Infrastructure Services

Container-native storage (CNS) now fully supports all OpenShift Container Platform infrastructure services: registry, logging, and metrics.

OpenShift Container Platform logging (with Elasticsearch) and OpenShift Container Platform metrics (with Cassandra) are fully supported on persistent volumes backed by CNS/CRS iSCSI block storage.

The OpenShift Container Platform registry is hosted on CNS/CRS by RWX persistent volumes, providing high availability and redundancy through Gluster architecture.

Logging and metrics were tested at scale with 1000+ pods.

Automated Container Native Storage Deployment with OpenShift Advanced Installation

OpenShift Container Platform 3.7 now includes an integrated and simplified installation of container-native storage (CNS) through the advanced installer. The advanced installer is enhanced for automated and integrated support for deployment of CNS including block provisioner, S3 provisioner, and files for correctly configured out-of-the-box OpenShift Container Platform and CNS. The CNS storage device details are added to the installer’s inventory file. The installer manages configuration and deployment of CNS, its dynamic provisioners, and other pertinent details.

Official FlexVolume Support for Non-storage Use Cases

There is now a supported interface to allow you to bind and mount in content from a running pod. FlexVolume is a script interface that runs on the kubelet and offers five main functions to help you mount in content such as device drivers, secrets, and certificates as bind mounts to the container from the host:

  • init - Initialize the volume driver.

  • attach - Attach the volume to the host.

  • mount - Mount the volume on the host. This is the step that makes the volume available to the host by mounting it under /var/lib/kubelet.

  • unmount - Unmount the volume.

  • detach - Detach the volume from the host.
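
As an illustration, a FlexVolume driver is a small executable that the kubelet calls with one of these operations as its first argument and that replies with JSON. The following Bash sketch of a hypothetical, do-nothing driver shows the general shape:

#!/bin/bash
# Hypothetical, do-nothing FlexVolume driver sketch.
# The kubelet invokes the executable as: <driver> <operation> [<json options> ...]
op="$1"

case "${op}" in
  init)
    # Report success and, optionally, driver capabilities.
    echo '{"status": "Success", "capabilities": {"attach": true}}'
    ;;
  attach|detach|mount|unmount)
    # A real driver would attach or mount the backing device here,
    # using the JSON options passed in the remaining arguments.
    echo '{"status": "Success"}'
    ;;
  *)
    echo '{"status": "Not supported"}'
    exit 1
    ;;
esac
exit 0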

Scale

Cluster Limits

Updated guidance around Cluster Limits for OpenShift Container Platform 3.7 is now available.

Updated Tuned Profile Hierarchy

The Tuned Profile Hierarchy is updated as of 3.7.

Cluster Loader

Guidance regarding use of Cluster Loader is now available with the release of OpenShift Container Platform 3.7. Cluster Loader is a tool that deploys large numbers of various objects to a cluster, which creates user-defined cluster objects. Build, configure, and run Cluster Loader to measure performance metrics of your OpenShift Container Platform deployment at various cluster states.

Guidance on Overlay Graph Driver with SELinux

In OpenShift Container Platform 3.7, guidance about the benefits of using the Overlay Graph Driver with SELinux is now available.

Providing Storage to an etcd Node Using PCI Passthrough with OpenStack

Guidance is now available on providing storage to an etcd node using PCI passthrough with OpenStack.

Networking

Network Policy

Network Policy is now fully supported in OpenShift Container Platform 3.7.

Network Policy is an optional plug-in specification of how selections of pods are allowed to communicate with each other and other network endpoints. It provides fine-grained network namespace isolation using labels and port specifications.

After installing the Network Policy plug-in, an annotation that flips the namespace from allow all traffic to deny all traffic must first be set on the namespace. At that point, NetworkPolicies can be created that define what traffic to allow. The annotation is as follows:

$ oc annotate namespace ${ns} 'net.beta.kubernetes.io/network-policy={"ingress":{"isolation":"DefaultDeny"}}'

The annotation is not needed when using the v1 API.

The allow-to-red policy specifies "all red pods in namespace project-a allow traffic from any pods in any namespace." This does not apply to the red pod in namespace project-b because podSelector only applies to the namespace in which it was applied.

Policy applied to project
kind: NetworkPolicy
apiVersion: extensions/v1beta1
metadata:
  name: allow-to-red
spec:
  podSelector:
    matchLabels:
      type: red
  ingress:
  - {}

See Managing Networking for more information.

Cluster IP Range Now More Flexible

Cluster IP ranges are now more flexible by allowing multiple subnets for hosts. This provides the capability to allocate multiple, smaller IP address ranges for the cluster. This makes it easier to migrate from one allocated IP range to another.

There are multiple comma-delimited CIDRs in the configuration file. Each node is allocated only a single subnet from within one of the available ranges. You cannot allocate different-sized host subnets within a single CIDR, or use this feature to change the host subnet size of existing allocations; however, nodes can be allocated different-sized subnets because each clusterNetworks entry can set its own hostSubnetLength value. The clusterNetworkCIDRs can be different sizes, but each must be equal to or larger than its host subnet size. Nodes cannot use subnets that are not part of the clusterNetworkCIDRs.

In regard to migration or edits, networks can be added to the list, CIDRs in the list may be re-ordered, and a CIDR can be removed from the list when there are no nodes that have an SDN allocation from that CIDR.

Example:

networkConfig:
  clusterNetworkCIDR: 10.128.0.0/24
  clusterNetworks:
  - cidr: 11.128.0.0/24
    hostSubnetLength: 6
  - cidr: 12.128.0.0/24
    hostSubnetLength: 6
  - cidr: 13.128.0.0/24
    hostSubnetLength: 4
  externalIPNetworkCIDRs:
  - 0.0.0.0/0
  hostSubnetLength: 6

Configurable Route Cookie Names for Session Persistence

The HAProxy router can look for a cookie in a client request. Based on that cookie name and value, the router always routes requests that have that cookie to the same pod, instead of relying upon the client source IP, which can be obscured by an F5 performing load balancing.

A cookie with a unique name is used to handle session persistence.

  1. Set a per-route configuration to set the cookie name used for the session.

  2. Add an environment variable to set a router-wide default.

  3. Ensure that the cookie is set and honored by the router to control access.

Example scenario:

  1. Set a default cookie name for the HAProxy router:

    $ oc env dc/router ROUTER_COOKIE_NAME=default-cookie
  2. Log in as a normal user and create the project/pod/svc/route:

    $ oc login user1
    $ oc new-project project1
    $ oc create -f https://example.com/myhttpd.json
    $ oc create -f https://example.com/service_unsecure.json
    $ oc expose service service-unsecure
  3. Access the route:

    $ curl $route -v

    The HTTP response will contain the cookie name. For example:

    Set-Cookie: default-cookie=[a-z0-9]+
  4. Modify the cookie name using route annotation:

    $ oc annotate route service-unsecure router.openshift.io/cookie_name="route-cookie"
  5. Re-access the route:

    $ curl $route -v

    The HTTP response will contain the new cookie name:

    Set-Cookie: route-cookie=[a-z0-9]+

See Route-specific Annotations for more information.

HSTS Policy Support

HTTP Strict Transport Security (HSTS) ensures all communication between the server and client is encrypted and that all sent and received responses are delivered to and received from the authenticated server.

An HSTS policy is provided to the client via an HTTPS header (HSTS headers over HTTP are ignored) using an haproxy.router.openshift.io/hsts_header annotation on the route. When the Strict-Transport-Security response header is received by a client, it observes the policy until it is updated by another response from the host or it times out (a max-age of 0 expires the policy immediately).

Example using reencrypt route:

  1. Create the pod/svc/route:

    $ oc create -f https://example.com/test.yaml
  2. Set the Strict-Transport-Security header:

    $ oc annotate route serving-cert haproxy.router.openshift.io/hsts_header="max-age=300;includeSubDomains;preload"
  3. Access the route using https:

    $ curl --head https://$route -k
    
       ...
       Strict-Transport-Security: max-age=300;includeSubDomains;preload
       ...

Enabling Static IPs for External Project Traffic (Technology Preview)

As a cluster administrator, you can assign specific, static IP addresses to projects, so that traffic is externally easily recognizable. This is different from the default egress router, which is used to send traffic to specific destinations.

Recognizable IP traffic increases cluster security by ensuring the origin is visible. Once enabled, all outgoing external connections from the specified project will share the same, fixed source IP, meaning that any external resources can recognize the traffic.

Unlike the egress router, this is subject to EgressNetworkPolicy firewall rules.
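
For illustration, a static egress IP is assigned by patching the project's netnamespace and a node's hostsubnet; the project name, node name, and address below are hypothetical:

$ oc patch netnamespace project1 -p '{"egressIPs": ["192.168.120.10"]}'
$ oc patch hostsubnet node1.example.com -p '{"egressIPs": ["192.168.120.10"]}'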

See Managing Networking for more information.

Master

Public Pull URL Provided for Images

A public pull URL is now provided for images, so users no longer need to know the internal in-cluster IP address or DNS name of the registry service.

A new API field for the image stream with the public URL of the image was added, and a public URL is configured in the master-config.yaml file. The web console understands this new field and automatically generates the public pull specification for users (so users can just copy and paste the pull URL).

Example:

  1. Check the internalRegistryHostname setting in the master-config.yaml file:

      ...
      imagePolicyConfig:
        internalRegistryHostname: docker-registry.default.svc:5000
      ...
  2. Delete the OPENSHIFT_DEFAULT_REGISTRY variable in both:

    /etc/sysconfig/atomic-openshift-master-api
    /etc/sysconfig/atomic-openshift-master-controllers
  3. Start a build and check the push URL. It should push the new build image with internalRegistryHostname to the docker-registry.

Custom Resource Definitions

A resource is an endpoint in the Kubernetes API that stores a collection of API objects of a certain kind (for example, pod objects). A custom resource definition is a built-in API that enables you to plug in your own custom, managed object and application as if it were native to Kubernetes. Therefore, you can leverage Kubernetes cluster management, RBAC and authentication services, API services, the CLI, security, and so on, without having to know Kubernetes internals or modify Kubernetes itself in any way.

Custom Resource Definitions (CRDs) deprecate Third Party Resources in Kubernetes 1.7.
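
For example, a minimal sketch of registering a new resource type; the backups.example.com group and Backup kind are hypothetical:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com
spec:
  group: example.com
  version: v1
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup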

How it works:

  1. Define a CRD class (your custom objects) and register the new resource type. This defines how it fits into the hierarchy and how it will be referenced from the CLI and API.

  2. Define a function to create a custom client, which is aware of the new resource schema.

  3. Once completed, it can be accessed from the CLI. However, in order to build controllers or custom functionality, you need API access to the objects, and so you need to build a set of CRUD functions (library) to access the objects and the event-driven listener for controllers.

  4. Create a client that:

    • Connects to the Kubernetes cluster.

    • Creates the new CRD (if it does not exist).

    • Creates a new custom client.

    • Creates a new test object using the client library.

    • Creates a controller that listens to events associated with new resources.

API Aggregation

There is now Kubernetes documentation on how API aggregation works in OpenShift Container Platform 3.7 and how other users can add third-party APIs.

Master Prometheus Endpoint Coverage

Prometheus endpoint logic was added to upstream components so that monitoring and health indicators can be added around deployment configurations.

Installation

Migrate etcd Before OpenShift Container Platform 3.7 Upgrade

Starting in OpenShift Container Platform 3.7, the use of the etcd3 v3 data model is required.

OpenShift Container Platform gains performance improvements with the v3 data model. In order to upgrade the data model, the embedded etcd configuration option is no longer allowed. Embedded etcd (as distinct from co-located external etcd) was mainly used in single-master deployments. Migration scripts convert the data to the v3 data model and allow you to move an embedded etcd to an external etcd, either on the same host as the masters or on a different host. In addition, there is a new scale-up ability for etcd clusters.

See Migrating Embedded etcd to External etcd for more information.

Modular Installer to Allow Playbooks to Run Independently

The installer has been enhanced to allow administrators to install specific components. By breaking up the roles and playbooks, there is better targeting of ad hoc administration tasks.

New Installation Experience Around Phases

When you run the installer, OpenShift Container Platform now reports back at the end what phases you have gone through.

If the installation fails during a phase, you are notified on the screen along with the errors from the Ansible run. Once you resolve the issue, rather than run the entire installation over again, you can pick up from the failed phase. This provides an increased level of control during installations and saves time.

Increased Control Over Image Stream and Templates

With OpenShift Container Platform 3.7, there is added control over whether or not your cluster automatically upgrades all the content provided during cluster upgrades.

Edit the openshift_install_examples variable in the hosted file or set it as a variable in the installer.

RPM installations: /etc/origin/examples and /etc/origin/hosted
Containerized installations: /usr/share/openshift/examples and /usr/share/openshift/hosted

openshift_install_examples=false

Setting openshift_install_examples to false causes the installer to not upgrade the image streams and templates. The default is true.

Installation and Configuration of CFME 4.6 from the OpenShift Installer

Red Hat CloudForms Management Engine (CFME) 4.6 is now fully supported running on OpenShift Container Platform 3.7 as a set of containers.

CFME 4.6 is not yet released. Until it is available, this role is limited to installing ManageIQ (MIQ), the open source project that CFME is based on. The following is provided mainly for informational purposes. The OpenShift Container Platform 3.7 documentation will be updated with more complete instructions on deploying CFME 4.6 after it has been released.

CFME is an available API endpoint on all OpenShift Container Platform clusters that choose to use it. More cluster administrators are now able to leverage CFME and begin experiencing the insight and automations available to them in OpenShift Container Platform.

To install CFME 4.6:

# ansible-playbook -v -i <YOUR_INVENTORY> \
    playbooks/byo/openshift-management/config.yml

There is a known issue with this playbook.

To configure CFME 4.6 to consume the OpenShift Container Platform installation it is running on:

# ansible-playbook -v -i <YOUR_INVENTORY> \
    playbooks/byo/openshift-management/add_container_provider.yml

You can also automate the configuration of the provider to point to multiple OpenShift clusters:

# ansible-playbook -v -e container_providers_config=/tmp/cp.yml \
    playbooks/byo/openshift-management/add_many_container_providers.yml

The /tmp/cp.yml file requires some manual configuration to create and use correctly.

See Multiple Container Providers for more information.

Diagnostics

Additional Health Checks

More health checks are now available for administrators to run after installations and upgrades. Administrators need the ability to run tests periodically to help determine the health of the framework components within the cluster. OpenShift Container Platform 3.7 offers this test functionality via Ansible playbooks, and the output can be written to files.

$ ansible-playbook playbooks/byo/openshift-checks/adhoc.yml
                curator
                diagnostics
                disk_availability
                docker_image_availability
                docker_storage
                elasticsearch
                etcd_imagedata_size
                etcd_traffic
                etcd_volume
                fluentd
                fluentd_config
                kibana
                logging
                logging_index_time
                memory_availability
                ovs_version
                package_availability
                package_update
                package_version

$ ansible-playbook playbooks/byo/openshift-checks/adhoc.yml \
    -e openshift_checks=fluentd_config,logging_index_time,docker_storage

Alternatively, they are included in the health playbook:

$ ansible-playbook playbooks/byo/openshift-checks/health.yml

To capture the output:

$ ansible-playbook playbooks/byo/openshift-checks/health.yml \
    -e openshift_checks_output_dir=/tmp/checks

Metrics and Logging

Journald for System Logs and JSON File for Container Logs

The docker log driver is set to json-file as the default for all nodes. The docker log driver can be set to journald, but there is no log rate throttling with the journald driver, so there is always a risk of denial-of-service attacks from rogue containers.

Fluentd will automatically determine which log driver (journald or json-file) the container runtime is using. Fluentd will now always read logs from journald and also /var/log/containers (if log-driver is set to json-file). Fluentd will no longer read from /var/log/messages.

See Aggregating Container Logs for more information.

Docker Events and API Calls Aggregated to EFK as Logs

Fluentd captures standard error and standard output from the running containers on the node. With this change, Fluentd also collects all the errors and events coming from the docker daemon running on the node and sends them to Elasticsearch (ES).

Enable this via the OpenShift Container Platform installer:

openshift_logging_fluentd_audit_container_engine=true

The collected information is stored in the operations indices of ES, and only cluster administrators have access to it. The event message includes the action, pod name, image name, user, and time stamp.

Master Events are Aggregated to EFK as Logs

The eventrouter pod scrapes events from the Kubernetes API and outputs them to STDOUT. The Fluentd plug-in transforms the log message and sends it to Elasticsearch (ES).

Enable openshift_logging_install_eventrouter by setting it to true; it is off by default. The eventrouter is deployed to the default namespace. Collected information is stored in the operations indices of ES, and only cluster administrators have access to it.
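
For example, the setting can be placed in the [OSEv3:vars] section of the inventory file:

[OSEv3:vars]
openshift_logging_install_eventrouter=true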

See the design documentation for more information.

Kibana Dashboards for Operations Are Now Shareable

This allows OpenShift Container Platform administrators the ability to share saved Kibana searches, visualizations, and dashboards.

When openshift_logging_elasticsearch_kibana_index_mode is set to shared_ops, one admin user can create queries and visualizations that are shared with other admin users. Non-admin users cannot see those queries and visualizations.

When openshift_logging_elasticsearch_kibana_index_mode is set to unique, users can only see the saved queries and visualizations that they created themselves. This is the default behavior.
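
For example, to enable sharing, the variable can be set in the inventory file used for the logging installation:

[OSEv3:vars]
openshift_logging_elasticsearch_kibana_index_mode=shared_ops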

See Aggregating Container Logs for more information.

Removed ES_Copy Method for Sending Logs to External ES

ES_COPY was replaced with the secure_forward plug-in for Fluentd, which sends logs from Fluentd to an external Fluentd instance (that can then ingest them into ES). ES_COPY is removed from the installer and the documentation.

When the OpenShift installer is run to upgrade logging to 3.7, it now checks for ES_COPY in the inventory and fails the upgrade with:

msg: The ES_COPY feature is no longer supported. Please remove the variable from your inventory

See Aggregating Container Logs for more information.

Expose Elasticsearch as a Route

By default, Elasticsearch (ES) deployed with OpenShift aggregated logging is not accessible from outside the logging cluster. This enables a route for external access to ES for those tools that want to access its data.

You now have direct access to ES using only your OpenShift token and have the ability to provide the external ES and ES Ops hostnames when creating the server certificate (similar to Kibana). Ansible tasks now simplify route deployment.
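
As an illustrative sketch, once the route exists, queries can be sent using your OpenShift token; the route host name and index pattern below are hypothetical:

$ token=$(oc whoami -t)
$ curl -k -H "Authorization: Bearer ${token}" \
    "https://es.apps.example.com/project.myproject.*/_search?q=message:error"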

Removed Metrics and Logging Deployers

The metrics and logging deployers are now replaced with playbook2image for oc cluster up, so that openshift-ansible is used to install logging and metrics:

$ oc cluster up --logging --metrics

Check metrics and pod status:

$ oc get pod -n openshift-infra
$ oc get pod -n logging

Prometheus (Technology Preview)

OpenShift Container Platform operators deploy Prometheus (currently in Technology Preview and not for production workloads) on an OpenShift Container Platform cluster, collect Kubernetes and infrastructure metrics, and get alerts. Operators can see and query metrics and alerts on the Prometheus web dashboard, or bring their own Grafana and hook it up to Prometheus.

See Prometheus on OpenShift for more information.

Integrated Approach to Adding Hawkular OpenShift Agent (Technology Preview)

Hawkular OpenShift Agent (HOSA) remains in Technology Preview and not for production workloads. It is packaged and can now be installed with the openshift_metrics_install_hawkular_agent option in the installer by setting it to true.

See Enabling Cluster Metrics for more information.

Developer Experience

Template Instantiation API

Clients can now easily invoke a server API instead of relying on client logic.

See Template Instantiation for more information.

Metrics

OpenShift Container Platform now includes:

  • Prometheus metrics that show you the health of builds in the system (number running, failing, failure reasons, and so on).

  • Timing information on build objects themselves to show how long they spent in various steps (not exposed as Prometheus metrics).

CLI Plug-ins (Technology Preview)

CLI plug-ins are currently in Technology Preview and not for production workloads.

Usually called plug-ins or binary extensions, this feature allows you to extend the default set of oc commands available and, therefore, allows you to perform new tasks.

See Extending the CLI for information on how to install and write extensions for the CLI.

Chaining Builds

In OpenShift Container Platform 3.7, Chaining Builds is a better approach for producing runtime-only application images, and fully replaces the Extended Builds feature.

Benefits of Chaining Builds include:

  • Supported by both Docker and Source-to-Image (S2I) build strategies, as well as combinations of the two, compared with the S2I strategy only for Extended Builds.

  • No need to create and manage a new assemble-runtime script.

  • Easy to layer application components into any thin runtime-specific image.

  • Can build the application artifacts image anywhere.

  • Better separation of concerns between the step that produces the application artifacts and the step that puts them into an application image.

Web Console

OpenShift Ansible Broker

In OpenShift Container Platform 3.7, Open Service Broker API is implemented, enabling users to leverage Ansible for provisioning and managing services from the Service Catalog. This is a standardized approach for delivering simple to complex multi-container OpenShift services via Ansible. It works in conjunction with Ansible Playbook Bundle (APB) for lightweight application definition. APBs can be used to deliver and orchestrate on-platform services, but could also be used to provision and orchestrate off-platform services (from cloud providers, IaaS, and so on).

OpenShift Ansible Broker supports production workloads and multiple service plans. There is now secure connectivity between Service Catalog and Service Broker.

You can interact with the Service Catalog to provision and manage services while the details of the broker remain largely hidden.

Ansible Playbook Bundles

Ansible Playbook Bundles (APBs) are short-lived, lightweight container images consisting of:

  • a simple directory structure with named action playbooks.

  • metadata (required and optional parameters, as well as dependencies).

  • an Ansible runtime environment.

Developer tooling is included, providing a guided approach to APB creation. There is also support for the test playbook, allowing for functional testing of the service. Two new APBs are introduced for MariaDB (SCL) and MySQL DB (SCL).

When a user provisions an application from the Service Catalog, the Ansible Service Broker will download the associated APB image from the registry and run it.

Developing APBs can be done in one of two ways: Creating the APB container image manually using standardized container creation tooling, or with APB tooling that Red Hat will deliver, which provides a guided approach to creation.

OpenShift Template Broker

The OpenShift Template Broker exposes templates through an Open Service Broker API to the Service Catalog.

The Template Broker matches the lifecycles of provision, deprovision, bind, and unbind with existing templates. No changes are required to templates, unless you expose bind. Your application will get injected with configuration details.

Initial Experience

OpenShift Container Platform 3.7 provides a better initial user experience with the Service Catalog. This includes:

  • A task-focused interface

  • Key call-outs

  • Unified search

  • Streamlined navigation

The new user interface is designed to really streamline the getting started process, in addition to incorporating the new Service Catalog items. It shows the existing content (for example, builder images and templates) as well as catalog items (if the catalog is enabled).

The new user experience can be enabled as a Technology Preview feature without the Service Catalog being active. A cluster with this user interface (UI) is still supported. Running the catalog UI without the Service Catalog enabled works, but access to templates without the catalog requires a few extra steps.

Search Catalog

OpenShift Container Platform 3.7 provides a simple way to quickly get what you want. The new Search Catalog user interface is designed to make it much easier to find items in a number of ways, making it even faster to find the items you want to deploy.

search catalog

Add from Catalog

Provision a service from the catalog. Select the desired service and follow prompts for the desired project and configuration details.

add to project

Connect a Service

Once a service is deployed, get coordinates to connect the application to it.

The broker returns a secret, which is stored in the project for use. You are guided through a process to update the deployment to inject a secret.

connect a service

Include Templates from Other Projects

Since templates are now served through a broker, there is now a way for you to deploy templates from other projects.

Upload the template, then select the template from a project.

Add to Project Options

Notifications

Key notifications are now under a single UI element, the notification drawer.

The bell icon is decorated when new notifications exist. You can mark all read, clear all, view all, or dismiss individual ones. Key notifications are represented with the level of information, warning, or error.

Notification drawer

Improved Quota Warnings

Quota notifications are now put in the notification drawer and are less intrusive.

quota warning

There are now separate notifications for each quota type instead of one generic warning. When at quota but not over quota, this is displayed as an informative message. The usage and maximum are displayed in the message. You can mark Don’t Show Me Again per quota type. Administrators can add custom messages to the quota warning.

Environment Variable Editor Added to the Stateful Sets Page

An environment variable editor is now added to the Stateful Sets page.

Stateful Sets Page

Support for the EnvFrom Construct

Anything with a pod template now supports the EnvFrom construct that lets you break down an entire configuration map or secret into environment variables without explicitly setting env name to key mappings.
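
For example, a minimal sketch of a pod template fragment that pulls every key of a config map and a secret into the container environment; the object names are hypothetical:

spec:
  containers:
  - name: app
    image: registry.example.com/myapp:latest
    envFrom:
    - configMapRef:
        name: app-config
    - secretRef:
        name: app-secrets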

Notable Technical Changes

OpenShift Container Platform 3.7 introduces the following notable technical changes.

API Connectivity Variables OPENSHIFT_MASTER and KUBERNETES_MASTER Are Now Deprecated

OpenShift Container Platform deployments using a custom strategy or hooks are provided with a container environment, which includes two variables for API connectivity:

  • OPENSHIFT_MASTER: A URL to the OpenShift API.

  • KUBERNETES_MASTER: A URL to the Kubernetes API exposed by OpenShift.

These variables are now deprecated, as they refer to internal endpoints rather than the published OpenShift API service endpoints. To connect to the OpenShift API in these contexts, use service DNS or the automatically exposed KUBERNETES service environment variables.

The OPENSHIFT_MASTER and KUBERNETES_MASTER environment variables are removed from deployment container environments as of OpenShift Container Platform 3.7.

openshift_hosted_{logging,metrics}_* Ansible Variables for the Installer Are Now Deprecated

The openshift_hosted_{logging,metrics}_* Ansible variables used by the installer have been deprecated. The installation documentation has been updated to use the newer variable names. The deprecated variable names are planned for removal in the next minor release of OpenShift Container Platform.

Removed generatedeploymentconfig API Endpoint

The generatedeploymentconfig API endpoint is now removed.

Deprecated Policy-related APIs and Commands

A large number of policy-related APIs and commands are now deprecated. In OpenShift Container Platform 3.7, the policy objects are completely removed and native RBAC is used instead. Any command trying to directly manipulate a policy object will fail. Roles and rolebindings endpoints are still available, and they proxy the operation to create native RBAC objects instead. The following commands do not work against a 3.7 server:

$ oadm overwrite-policy
$ oadm migrate authorization
$ oc create policybinding

A 3.7 client displays an error message when trying these commands against a 3.7 server, but still works against a previous server version; an old client will simply fail hard against a 3.7 server.

Red Hat Enterprise Linux Atomic Host Version 7.4.2.1 or Newer Required for Containerized Installations

In OpenShift Container Platform 3.7, containerized installations require Red Hat Enterprise Linux Atomic Host version 7.4.2.1 or newer.

Labeling Clusters for Amazon Web Services

Starting with 3.7 versions of the installer, if you configured AWS provider credentials, you must also ensure that all instances are labeled. Then, set the openshift_clusterid variable to the cluster ID. See Labeling Clusters for Amazon Web Services (AWS) for more information.

Stricter Security Context Constraints (SCCs)

With the release of OpenShift Container Platform 3.7, there are now some stricter security context constraints (SCCs). The following capabilities are now removed:

  • nonroot drops KILL, MKNOD, SETUID, and SETGID.

  • hostaccess drops KILL, MKNOD, SETUID, and SETGID.

  • hostmount-anyuid drops MKNOD.

It is possible that the pods that previously were admitted by these SCCs, and were using such capabilities, will fail after upgrade. In these rare cases, the cluster administrator should create a custom SCC for such pods.

CloudForms Management Engine (CFME) Support Changes

OpenShift Container Platform 3.7 now fully supports Installation and Configuration of CFME 4.6 from the OpenShift Installer. As previously stated, CFME 4.6 is not currently released. The current CFME installer implementation in OpenShift Container Platform 3.7, however, is incompatible with the Technology Preview deployment process of CFME 4.5 as described in the OpenShift Container Platform 3.6 documentation.

The OpenShift Container Platform 3.7 documentation will be updated with more complete instructions on deploying CFME 4.6 after it has been released.

Node Authorizer and Admission Plug-in for Managing Node Permissions

In OpenShift Container Platform 3.7, the node authorizer and admission plug-in are used to manage and limit a node’s permissions. Therefore, nodes should be removed from the group that previously granted them broad permissions across the cluster:

$ oc adm policy remove-cluster-role-from-group system:node system:nodes

In OpenShift Container Platform 3.8, this step should be performed automatically via Ansible as a post-upgrade step.

The kube-service-catalog Namespace Is Global

The kube-service-catalog namespace is now made global by Ansible. Therefore, if you want multicast to work in vnid 0, you must set the netnamespace.network.openshift.io/multicast-enabled=true annotation on both namespaces (default and kube-service-catalog).
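
For illustration, assuming the annotation is applied to the corresponding netnamespace objects, this might look like:

$ oc annotate netnamespace default netnamespace.network.openshift.io/multicast-enabled=true
$ oc annotate netnamespace kube-service-catalog netnamespace.network.openshift.io/multicast-enabled=true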

Migration to Kubernetes Role-based Access Control (RBAC)

Steps Taken During the 3.6 Release

A custom migration controller was created to automatically migrate OpenShift authorization policy resources to the equivalent RBAC resources:

  1. If an OpenShift authorization policy resource was created, modified, or deleted, the action was automatically mirrored to the corresponding RBAC resource.

  2. Changes directly applied to RBAC resources were, generally, automatically rolled back and forced to match the corresponding OpenShift authorization policy resource. If no corresponding resource existed, the RBAC resource would be deleted.

In essence, OpenShift authorization policy objects were the source of truth, and the RBAC objects were forced into matching these objects.

Release 3.6 Pre-upgrade Steps Before Upgrading to 3.7

There is a small set of configurations that are possible in OpenShift authorization policy resources that are not supported by RBAC. Such configurations require manual migration based on the use case. To guarantee that all OpenShift authorization policy objects are in sync with RBAC, the oc adm migrate authorization command has been added. This read-only command emulates the migration controller logic and reports if any resource is out of sync. It is run as a pre-upgrade step via an Ansible playbook and causes the upgrade to fail if the objects are not in sync.

During a Rolling Upgrade from Release 3.6 to 3.7

The following scenario describes a rolling upgrade:

  1. One master is upgraded and starts proxying OpenShift authorization policy resources and authorizing against RBAC objects.

  2. Old masters are still running the migration controller and one of them holds the controller leader election lock (either because it already had it or because it gained it by the first master being upgraded).

  3. The new master cannot modify any RBAC or proxied OpenShift authorization policy objects because the migration controller will undo all changes.

  4. Old masters can change OpenShift authorization policy resources and the migration controller will sync these to RBAC, making the changes visible to the new master.

  5. The new master does not have the migration controller.

  6. Controllers only speak to their local masters in OpenShift installed via Ansible, thus the migration controller is guaranteed to only communicate with the old masters.

  7. There is a small chance that a 3.7 controller process will become the leader once two masters have been upgraded (meaning no migrations of policy objects will occur after this point).

  8. Once all masters have been upgraded from 3.6 to 3.7, OpenShift authorization policy objects will be always proxied to RBAC objects.

  9. The migration controller will be gone and it will be possible to make changes to RBAC objects directly.

Considerations for Administrators During Rolling Upgrade

Avoid actions that require changes to OpenShift authorization policy resources such as the creation of new projects. If a project is created against a new master, the RBAC resources it creates will be deleted by the migration controller since they will be seen as out of sync from the OpenShift authorization policy resources. If a project is created against an old master and the migration controller is no longer present due to a 3.7 controller process being the leader, then its policy objects will not be synced and it will have no RBAC resources. After the 3.7 upgrade is complete, the following read-only script can be used to determine what namespaces lack RBAC role bindings (it is up to the cluster administrator to decide how to remediate these namespaces):

#!/bin/bash

set -o errexit
set -o nounset
set -o pipefail

for namespace in $(oc get namespace -o name); do
   ns=$(echo "${namespace}" | cut -d / -f 2)
   rolebindings_count=$(oc get rolebinding.rbac -o name -n "${ns}" | wc -l)
   if [[ "${rolebindings_count}" == "0" ]]; then
       echo "Namespace ${ns} has no role bindings which may require further investigation"
   else
       echo "Namespace ${ns}: ok"
   fi
done

RBAC and OpenShift Authorization Policy in Release 3.7

In 3.7, the RBAC objects become the source of truth. The OpenShift authorization policy objects no longer exist as real objects; the APIs are proxied to the RBAC resources. Therefore, creating, modifying, or deleting OpenShift authorization policy resources seamlessly results in actions against RBAC objects. The API master handles the conversion between these resources and legacy clients will continue to work as if nothing has changed. The RBAC objects also support watches, unlike the OpenShift authorization policy resources.

Policy-based resources have been removed in 3.7. However, RBAC role and binding objects are available and provide equivalent functionality.

Non-production Installations

The recommended way for installing non-production environments may change significantly in the next minor release of OpenShift Container Platform. Administrators should avoid tight coupling to the atomic-openshift-installer tool as part of the quick installer installation and upgrade processes.

Bug Fixes

This release fixes bugs for the following components:

Authentication

  • The secret for the private browser OAuth client was not correctly initialized. Therefore, the request token endpoint did not work. This bug fix correctly initializes the browser OAuth client on server start. The request endpoint can now be used to request tokens. (BZ#1491193)

  • The LDAP sync/prune command did not take into account the use of groupUIDNameMapping with a whitelist. The sync/prune command would fail with "group not found" errors because it would query for the wrong group name. With this bug fix, the command was updated to take groupUIDNameMapping into account when using a whitelist. Now, the command queries for the correct group name when groupUIDNameMapping and a whitelist are used together. (BZ#1484831)

  • RoleBinding objects can now be created without first creating a PolicyBinding object. (BZ#1477956)

Builds

  • ImageStream output references and their corresponding secrets were resolved during build creation time. If the output imagestream did not exist yet, no push secret would be computed, resulting in a build failure during push. With this bug fix, the ImageStream output and push secret will be computed when preparing to run the build, under logic which will retry until the imagestream is available. Builds that are started before the output imagestream exists will no longer fail during the push phase. (BZ#1443163)

  • Build, delete, and watch events, and the current Jenkins job being canceled were not handled when a build was canceled in OpenShift. Various negative, inconsistent Jenkins job results occurred along with many exception stack traces in the Jenkins system log. With this bug fix, Jenkins jobs are halted as soon as the build watch event detects that a build was deleted as the result of a build cancel action taken within OpenShift. There is now consistent, sensible behavior for the Jenkins users when builds are canceled or deleted. (BZ#1473329)

  • Source-to-image was not closing stdin/out/err pipes correctly in some error cases, causing a hang to occur. This was causing some OpenShift builds to hang in running status. (BZ#1442875)

  • The OpenShift Jenkins sync plug-in was updating Jenkins pipeline build status annotations every second, regardless of whether the status changed. The frequency of updates would put unnecessary stress on the etcd instance backing the OpenShift master. Now, Jenkins pipeline build status annotations are only updated if the status actually changes, or 30 seconds have passed. (BZ#1475867)

  • Permissions on directories injected as a build input via the image source input mechanism have user-only access permissions. The resulting application image cannot access the content when run as a random user ID. The directories will now be injected with group permissions, which allows the container user to access the directories. The directories will now be accessible at runtime as desired. (BZ#1480312)

  • When no tag is explicitly set, docker pulls all images. Builds would pull more images than necessary and take longer than needed. With this bug fix, a default tag will be set when the user does not supply a tag. Only a single image will be pulled for the build. (BZ#1498178)

  • The BitBucket build trigger webhook did not handle older versions of the webhook payload. Builds could not be triggered by older versions of the BitBucket server. This bug fix adds support for the older payload format. Builds can now be triggered by older versions of BitBucket. (BZ#1500731)

  • A regression bug was reported whereby source-to-image builds would fail if the source repository file system contained a broken symlink (pointing to a non-existent item). This is now resolved. (BZ#1506173)

Command Line Interface

  • The oc binary for macOS is not signed. Some of the customer’s company policies do not allow users to install unsigned binaries. This bug fix signs the oc binary using a Red Hat certificate. The oc binary is now trusted by companies that restrict the installation of unsigned binaries. (BZ#1436093)

  • The git clone command was being run without a timeout. Therefore, the oc new-app command was timing out. With this bug fix, oc new-app now uses git ls-remote with a timeout and the oc new-app command will not timeout. (BZ#1488283)

Containers

  • The POOL_META_SIZE configuration item has been added. Previously, the thin pool metadata size was set to 0.1% of the free space of the volume group. POOL_META_SIZE allows the operator to customize the thin pool metadata volume size to meet their workload. (BZ#1451769)
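
    For example, a minimal sketch of how this might be set in the storage setup configuration; the file path and the 16G value below are illustrative assumptions, not recommended settings:

      # /etc/sysconfig/docker-storage-setup (illustrative values)
      VG=docker-vg
      POOL_META_SIZE=16G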

Deployments

  • Shortly after OpenShift starts, the caches might not yet be synchronized. As a result, scaling the replication controllers could fail. The scaling is now retried when there is a cache miss. With this bug fix, the replication controllers are scaled properly. (BZ#1427992)

Images

  • A .NET Jenkins slave image for performing .NET CI/CD flows is now offered. This makes it easier to build and test .NET code bases using Jenkins. A .NET slave image is provided and configured out of the box in the Jenkins master image. (BZ#1451403)

  • Jenkins now installs all plug-ins via one RPM, and the missing plug-in is now included. (BZ#1481010)

  • importPolicy.insecure was ignored in oc import-image <imagestream:tag>. As a result, re-import from an insecure registry failed because it expected a valid SSL certificate. When the image stream tag exists, its importPolicy.insecure is now used. With this bug fix, re-import succeeds. (BZ#1494231)

Image Registry

  • Images younger than the pruning threshold were not added to the dependency graph, so a blob used by both a young image and a prunable image was deleted because it had no references in the graph. Young images are now added to the graph and marked as non-prunable. With this bug fix, the blob has references and is not deleted. (BZ#1487408)

  • The image pruning algorithm would consider only managed images for pruning. As a result, mirrored blobs for unmanaged images could not be pruned, and external images could not be removed by pruning. With this bug fix, the pruning algorithm evaluates all images, not just managed images. External images and their blobs can now be pruned. (BZ#1441028)

  • Previously, a bug in a regulator of concurrent file system access could cause a routine to hang. This caused many builds to hang during the registry push. This bug fix corrects the regulator. As a result, concurrent pushes no longer hang. (BZ#1436841)

  • Previously, the oadm prune images command would print confusing errors (such as operation timeout). This bug fix enables errors to be printed with hints. As a result, users are able to prune images, including images outside of the OpenShift cluster. (BZ#1469654)

  • The registry previously appended forwarded target ports to redirected location URLs. The client’s new request to the target location lacked credentials, and as a result, image push failed due to an authorization error. This bug fix rebased the registry to a newer version that fixes forwarding processing logic. As a result, clients can push images successfully to the exposed registry using arbitrary TLS-termination. (BZ#1471707)

  • Previously, imagestreamtags were not checked for dangling image references. This caused references to deleted images to be retained. This bug fix removes references to deleted images. As a result, deleting an image should allow references to the image to be deleted from imagestreamtags. (BZ#1386917)

  • Documentation and command help are now updated to include information on troubleshooting insecure connections to the secured registry. Error messages are now printed with hints, and new flags have been added to allow for insecure fall-back. As a result, users can now easily enforce both secure and insecure connections. (BZ#1448595)

Installer

  • Previously, the installation would fail when creating the Heketi secret because the key file was not copied to the first master host. This bug fix enables the installer to copy the SSH private key to the master node. (BZ#1477718)

  • The Ansible quick install would previously fail if the hostname was manually defined containing an uppercase letter. As a result, Kubernetes converted the names of the nodes to lowercase and did not recognize a node name with an uppercase letter. This bug fix ensures that hostnames for node objects are created with lowercase letters. (BZ#1396350)

  • When upgrading between versions (specifically 3.3/1.3 or earlier to 3.4 or later) the default values for clusterNetworkCIDR and hostSubnetLength changed. If the inventory file did not specify corresponding inventory variables, the upgrade would fail. This caused the controller service to not start back up. This bug fix requires that the inventory variables be set before upgrading or installing. As a result, if the required inventory variables are not set, the upgrade or installation will stop and tell the administrator to set the variables. (BZ#1451023)
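
    For example, a minimal inventory sketch; the CIDR and subnet length values shown are illustrative, and osm_cluster_network_cidr and osm_host_subnet_length are the openshift-ansible variables that correspond to clusterNetworkCIDR and hostSubnetLength:

      [OSEv3:vars]
      osm_cluster_network_cidr=10.128.0.0/14
      osm_host_subnet_length=9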

  • Previously, the node service was not restarted when Open vSwitch was restarted, which could result in a misconfigured networking environment. This bug fix updates the services to ensure that the node service is restarted whenever Open vSwitch is restarted. (BZ#1453113)

  • Previously, the Ansible facts did not add the svc domain to the NO_PROXY settings. As a result, users behind proxies were not able to push to the registry by DNS. This bug fix adds the svc domain in the Ansible facts code. As a result, users behind a proxy can now push to the registry by DNS. (BZ#1467776)

  • The flannel network was previously defined using the same subnet as the Kubernetes services subnet. This caused a conflict between services and SDN networks. The flannel network is now correctly defined by the osm_cluster_network_cidr variable. (BZ#1473858)

  • The necessary role for role binding in openshift_metrics was missing due to being processed out of order in the role. The role binding creation would fail and the role would fail to install. This bug fix updates the metrics to create the role immediately. As a result, role binding can be created during installation. (BZ#1476195)

  • The etcd scaleup playbook had an error where it attempted to run commands on hosts other than the host currently being scaled up, resulting in an error if the other hosts did not yet have certain dependencies met. The playbooks now properly target only the host currently being scaled up. (BZ#1490739)

  • The stand-alone entry point for the openshift_storage_nfs task did not have the os_firewall role included. This resulted in the firewall not being properly installed and configured. The os_firewall role has been added to the play. (BZ#1491657)

  • The etcd quota backend was set to 2GB by default. This resulted in a cluster going into a hold state, blocking all writes into the etcd storage. The default quota backend was increased to 4GB by default to encompass the storage needs of bigger clusters. (BZ#1492891)
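
    As an illustration, the quota can be set explicitly in the etcd configuration; the file path and the exact byte value below are assumptions for the example, not installer-managed defaults:

      # /etc/etcd/etcd.conf (illustrative)
      ETCD_QUOTA_BACKEND_BYTES=4294967296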

  • When a company CA is added as a named certificate, the CA is added to ca-bundle.crt as well. This can cause client certificate pop-ups when using IE, Safari, or Chrome if the user has client certificates configured via the browser. The code has been changed to not use ca-bundle.crt and to use the internal CA as the client certificate CA. (BZ#1493276)

  • As part of deprecating the use of the openshift_hosted_{logging,metrics}_* variables, a default size for the storage volume was not set for an NFS installation. As a result, the playbook would fail because the variable was not defined at runtime. The code was changed to use a default of '10Gi' if not specified. The installer now runs as expected. (BZ#1495203)

  • The disconnected installer did not have a way to specify a username/password to login to the docker repository to access downloaded images, requiring the user to disable authentication. The installation script now includes a mechanism for entering credentials. (BZ#1500642)

  • A new Docker option, --signature-enabled, introduced in a recent Docker release is set to false by default. The OpenShift Container Platform installation removed the parameter during installation, and Docker would then get the default value of true. The Ansible scripts have been changed to include this option. (BZ#1502560)

  • Upgrading the logging component from 3.4.1 to 3.5.0 using Ansible failed with a No Elasticsearch pods found running error. The logging upgrade has been disabled as the EFK stack used for 3.4 and 3.5 is the same. The upgrade functionality is not necessary. (BZ#1435144)

  • Using Ansible to configure the OpenID Connect provider for the OpenID and GitLab identity providers resulted in an error when setting challenge to true, because the validate function did not allow it. The Ansible validate function was removed for the OpenID and GitLab providers. The installation can now complete successfully, and login succeeds. (BZ#1444367)

  • Docker 1.12.6-34 uses /etc/containers/registries.conf to define registries, but OpenShift Container Platform installer uses /etc/sysconfig/docker. As a result, system containers were reading registry information from the incorrect file. The code was changed to duplicate the registries in both locations to ensure additional/blocked/insecure registries are honored. (BZ#1460930)

  • A containerized installation with system containers enabled (use_system_containers=true) failed due to missing mounts. The code was updated so that the install performs as expected. (BZ#1463574)

  • OpenShift Container Platform would correctly fail if the public host name was 64 characters or greater. However, the error message displayed did not report the source of the failure. The installer has been changed to report when the installation fails due to hostname length. (BZ#1467790)

  • When installing the service catalog, the template service broker (TSB) was not getting created. As a result, the TSB had to be created manually. The code has been changed so that the TSB is created automatically. (BZ#1470623)

  • Input for include_granted_scopes, which was expected to be a single quoted boolean string, was instead being interpreted and written to the configuration file incorrectly, so the resulting file could have the wrong value for include_granted_scopes. The code block that attempted to interpret the input for include_granted_scopes has been removed. Input provided via include_granted_scopes now passes to master-config.yml as expected. (BZ#1488505)

  • Because the Docker image availability health check does not support authenticated registries, checks failed when running against an authenticated registry. The code was changed to allow Docker to health check authenticated registries. (BZ#1488833)

  • Running the redeploy-router-certificates.yml playbook caused the router pod to fail (CrashLoopBackOff). The code was changed so that after running the redeploy-router-certificates.yml playbook, the router pod runs as expected. (BZ#1490186)

  • With Ansible 2.3, warnings are issued when using Jinja delimiters in 'when' conditions. The delimiters have been removed from the code base to avoid these warnings. (BZ#1490268)

  • Due to an earlier code change, the installation failed when giving a wildcard certificate to the installer. The code has been changed to properly copy a wildcard certificate during installation. (BZ#1492786)

  • Because of internal refactoring, the list of hostnames in the NO_PROXY file was empty. The facts have been restored, and the list of NO_PROXY names is now correctly defined. (BZ#1495142)

  • When openshift_docker_use_system_container was set to false, the installer was incorrectly attempting to start the container engine, resulting in the installation failing. The installer code was changed and the installation proceeds as expected. (BZ#1496725)

  • The installer can now use an inventory specified as a directory rather than just a single file. This adds a parameter INVENTORY_DIR to the openshift-ansible image such that the user can indicate that ansible-playbook should use a mounted inventory directory. (BZ#1498908)
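
    A hypothetical invocation sketch follows; the image name, playbook path, and mount locations are assumptions for illustration only:

      docker run -u `id -u` \
          -v $HOME/inventories:/tmp/inventories:Z \
          -e INVENTORY_DIR=/tmp/inventories \
          -e PLAYBOOK_FILE=playbooks/byo/config.yml \
          registry.access.redhat.com/openshift3/ose-ansible:v3.7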

  • The logic for selecting the Enterprise registry was moved to a location that was never read when installing system containers. Enterprise installs using system containers would fail because the openshift-ansible image could not be found in the Docker Hub registry. The Enterprise registry logic was moved into a high-level playbook so that it is set for all runtime setups. The Enterprise images can now be found and installation works. (BZ#1503860)

  • Due to recent simplification and refactoring, there was a possibility of /etc/atomic.conf not being updated with proxy values before the first atomic command was executed. Proxy use with the atomic command did not work during the install. A new openshift_atomic role has been created for atomic-specific tasks. The first task added is proxy, which handles updating /etc/atomic.conf to ensure the proper proxy configuration is in place. This task file is then included (via include_role) in system container related task files. The atomic command is now always able to use the properly defined proxy settings. (BZ#1503903)

  • An undefined variable was used in a task, causing a Jinja template evaluation error that would crash the installation. The undefined variable has been removed and replaced with more informative error text. The playbook no longer errors out for external NFS storage class installations. (BZ#1504535)

  • The OpenShift Health Checker was not part of an Installer Phase and was not reported after playbook execution. The OpenShift Health Checker section of the primary installer path has been moved to its own section and an installer 'phase' has been added to report on installer status. (BZ#1504593)

  • When updating the openshift-ansible package, all subpackages are now updated in order to keep them in sync. (BZ#1506971)

  • The NetworkManager dispatcher script responsible for configuring a host to use dnsmasq operated in a non-atomic manner, resulting in failed DNS queries during boot up. The script has been refactored to ensure that required services are verified before /etc/resolv.conf is reconfigured. (BZ#1410288)

  • Using the Ansible installer to install metrics with dynamic storage failed. Installation now fails if the parameter storage kind = 'dynamic' is set without enabling dynamic provisioning. (BZ#1415297)

  • An error occurred from the yum module during the upgrade process. Yum transactions are now retried. (BZ#1479533)

  • The 'registry-console' image stream did not have a source tag specified, causing it to be improperly imported. The source tag has been added to the image stream, ensuring that it imports properly. (BZ#1480442)

  • When enabling API aggregation with the ovs-multitenant SDN driver, creating a global project failed due to a performance latency issue. While creating a global project, the netnamespace is now checked to ensure availability and the Ansible Playbook Bundle finishes the operation. (BZ#1487959)

  • The device mapper kernel modules may not have been loaded on a host if overlay2 storage was used, which prevented the gluster storage system from working properly. With this fix, the installer now ensures that when gluster is used the dm_thin_pool, dm_snapshot, and dm_mirror modules are loaded. (BZ#1490905)

  • Previously, if there was no DNS search path in /etc/resolv.conf, then the NetworkManager dispatcher would omit adding cluster.local to the search path. With this bug fix, the dispatcher script was updated to ensure that a search path is created if one did not already exist. (BZ#1496593)

  • The example inventories have been updated to clearly indicate that the NFS export directory must only consist of lowercase alphanumeric characters, hyphens or periods, and must start and end with an alphanumeric character. (BZ#1488366)

  • The quick installer tool, atomic-openshift-installer, was initially blocked for use with OpenShift Container Platform 3.7 due to a bug. This has now been fixed in the latest update. (BZ#1509112)

Logging

  • Messages were read into Fluentd’s memory buffer and were lost if the pod was restarted because Fluentd considered them read, but they were not pushed to storage. This caused the loss of any message not stored, but already read by Fluentd. This fix replaced the memory buffer with a file based buffer. As a result, the file buffered messages are pushed to storage once Fluentd restarts. (BZ#1460749)

  • Kibana visualizations and a dashboard for monitoring container and pod logs allow administrator users (cluster-admin or cluster-reader) to view logs by deployment, namespace, pod, and container. The es_load_kibana_ui_objects script is used to load the dashboards and other Kibana UI objects for a given user. To use it, run oc exec $espod -- es_load_kibana_ui_objects user-name. The script exists inside the Elasticsearch and ES-OPS pods and must be run inside those pods. Additionally, it requires some indices and other objects set up by the OpenShift Elasticsearch plug-in, so the user must log in to Kibana or Elasticsearch before using this script. The script also adds an index pattern for project.* and loads the necessary index pattern file. The Kibana visualizations and dashboard give administrators an easier way to view Kubernetes and OpenShift related logs in the cluster, providing graphs and a dashboard for viewing logs from OpenShift pods and containers. (BZ#1467963)

  • The execute bit was previously not set for run.sh in the downstream repository. It is now set. (BZ#1474715)

  • The value of buffer_chunk_limit is now configurable and defaults to 1M. To configure buffer_chunk_limit, set the environment variable BUFFER_SIZE_LIMIT or openshift_logging_fluentd_buffer_size_limit in the Ansible inventory file. To cover various types of input, buffer_chunk_limit needs to be configurable. The "size of the emitted data exceeds buffer_chunk_limit" error can be fixed by configuring buffer_chunk_limit. (BZ#1413147)
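
    For example, in the Ansible inventory (the value shown is illustrative):

      [OSEv3:vars]
      openshift_logging_fluentd_buffer_size_limit=8m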

  • Role permissions were generated based upon the project, causing queries to be disallowed if they involved multiple indices. This fix generates role permissions based on the user and not the project, allowing users to query across multiple indices. (BZ#1445425)

  • The openshift-elasticsearch-plugin was creating ACL roles based on the provided name, which could include slashes and commas. This caused the dependent lib to not properly evaluate roles. This fix hashes the name when creating ACL roles so they no longer contain the invalid characters. Now, users can use kibana and logging. (BZ#1456584)

  • The Ansible parameter name was confusing to use and did not properly reflect how it was consumed by Fluentd. This fix removed the parameter, allowing Fluentd to consistently collect logs based on the source it detects. (BZ#1466152)

  • Elasticsearch was logging to console logs, resulting in Elasticsearch ending up in a feedback loop ingesting its own logs. This fix turned off console logs in favor of file logs. As a result, the feedback loop is broken, but users will need to set up the Elasticsearch log volume with file rotation to get Elasticsearch logs. Additionally, running oc logs against an Elasticsearch pod will no longer be sufficient to retrieve Elasticsearch pod logs. (BZ#1432607)

  • The Elasticsearch default value for sharing storage between Elasticsearch instances was wrong. The incorrect default allowed an Elasticsearch pod starting up (while another Elasticsearch pod was shutting down) to create a new location on the PV for managing the storage volume, duplicating data and, in some instances, potentially causing data loss. With this fix, all Elasticsearch pods now run with node.max_local_storage_nodes set to 1. As a result, Elasticsearch pods that are starting up and shutting down no longer share the same storage, preventing data duplication and data loss. (BZ#1460564)

  • Underscores are now used instead of dashes when providing memory switches to the Node.js runtime. As a result, the Node.js interpreter understands the request. (BZ#1464020)

  • The openshift_logging_purge_logging Ansible variable was introduced to purge logging persistent data. Because openshift_logging_install_logging=false keeps persistent data, there was a need for a complete uninstall option. As a result, there are no changes to openshift_logging_install_logging, and the additional openshift_logging_purge_logging variable enables a complete uninstall. (BZ#1467265)
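
    For example, an inventory sketch for a complete uninstall of the logging stack (values are illustrative):

      [OSEv3:vars]
      openshift_logging_install_logging=false
      openshift_logging_purge_logging=true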

  • In the configuration for the Fluentd systemd input plug-in, the read_from_head parameter was not set properly based on the environment variable JOURNAL_READ_FROM_HEAD or its corresponding Ansible parameter openshift_logging_fluentd_journal_read_from_head. Due to the problem, the full contents of pre-existing logs were indexed instead of the latest logs captured by “tail” when a pos_file does not exist, which happens when the logging system is initially deployed or a pos_file is deleted. With this bug fix, the parameter is correctly set. And based on the setting, if JOURNAL_READ_FROM_HEAD=true, all the logs are indexed; if JOURNAL_READ_FROM_HEAD=false, logs read from "tail" are indexed when a pos_file does not exist. (BZ#1488941)
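
    For example, to index only the latest journal entries when no pos_file exists (illustrative inventory setting):

      [OSEv3:vars]
      openshift_logging_fluentd_journal_read_from_head=false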

  • Deploying logging-fluentd with secure-forward to send the collected logs to logging-mux requires openshift_logging_mux_client_mode=maximal with openshift_logging_use_mux=True in the Ansible inventory if the Fluentd container and the mux container are on the same node. If openshift_logging_mux_client_mode=maximal is set without openshift_logging_use_mux=True, the mux secret directory /etc/fluent/muxkeys is mounted in the Fluentd container even though the secret directory does not exist, which makes Fluentd hang when it tries to access the mux secrets at startup. This patch checks the values of openshift_logging_mux_client_mode and openshift_logging_use_mux in the Ansible playbook and, if the former is set while the latter is false, does not mount the mux secret directory in the Fluentd container. Also, if the Fluentd start script finds that the mux secret directory does not exist, it disables openshift_logging_mux_client_mode even if it is enabled. (BZ#1490647)
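
    For example, an inventory sketch for sending Fluentd output to mux (values taken from the description above; adjust for your environment):

      [OSEv3:vars]
      openshift_logging_use_mux=True
      openshift_logging_mux_client_mode=maximal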

  • The json-file parser was assuming the "time" field was a Time object instead of a String object, which does not have a "utc" method, causing the logs to fill with errors. This fix checks the type of the "time" field object and converts the String to a Time object if necessary. As a result, json-file read time values are parsed correctly with no errors. (BZ#1491405)

  • The openshift-elasticsearch-plugin was creating ACL roles based on the provided name which could include slashes and commas. This caused the dependent lib to not properly evaluate roles. This fix hashes the name when creating ACL roles so they no longer contain the invalid characters. As a result, users can use Kibana and logging. (BZ#1494239)

Web Console

  • Previously in the web console pod terminal, you could not enter third-level characters using the AltGr key such as ‘|’ (pipe) in some keyboard layouts. Now Alt+Gr-<character> combinations work properly in the web console pod terminal. (BZ#1292507)

  • In the web console, copying and pasting content from the terminal could result in extra spaces being added to the end of each line. Now when you copy content from the terminal, no extra spaces are added. (BZ#1395564)

  • The left navigation column did not support vertical scrolling. When the browser viewport was less than 440 pixels tall and wider than 768 pixels the bottom left navigation link was not accessible. The new left navigation column markup supports vertical scrolling. Now, all left navigation links are accessible at all browser viewport sizes and zoom levels. (BZ#1375134)

  • Previously, on iOS Safari, number inputs used the full keyboard rather than the number input. Now inputs that accept only numbers show the iOS number pad for easier entry. (BZ#1470976)

  • Previously, some requests for templates in the web console could timeout or take a long time to complete over high latency network connections. This could cause an error when loading the Add to Project page. The web console can now load templates using much less data, which fixes the problem. (BZ#1471033)

  • The help text on the Route creation and editing pages has been clarified to make it clear that the CA certificates should be certificate chains. (BZ#1471155)

  • A known bug in Internet Explorer resulted in the layout of pod charts overflowing their containers on the overview page. As a result, the pod charts looked mis-aligned in the UI. The fix involved increasing the specificity on some CSS declarations so that they only apply when they are needed, which is during a deployment when the pod charts are being animated. As a result, the pod charts appear correctly aligned in Internet Explorer. (BZ#1473512)

  • A known bug in Internet Explorer resulted in the layout of catalog items taking up too much space. As a result, not all the catalog items were visible in Internet Explorer. The fix involved adding an additional CSS declaration as a workaround for IE. As a result, the catalog items now take up the correct space in IE. (BZ#1473615)

  • The code was using an empty envFrom entry when creating/editing the environment variable, causing a validation failure when adding or editing an environment variable using Deployment Configuration page of the web console. The user would receive an error that the deployment configuration is invalid. The envFrom entry is now properly submitted and the user can add or edit environment variables from the web console. (BZ#1502914)

  • Various errors in the source code prevented config maps from being available in the drop-down menu on the Edit Deployment Config page for pre- and post-hooks when using Add Value from Config Map or Secret. These errors have been corrected, and config maps now appear in the appropriate drop-downs. (BZ#1502914)

  • Previously, secrets with null values would display incorrectly when values were revealed on the secret details page. Now the web console will correctly display the secret key as having no value. (BZ#1510346)

  • Previously there was a quirk in the drag-and-drop behavior of the key value editor. While reordering an env var it might jump more than a single node at a time. This bug fix ensures that the drag-and-drop behavior will behave as expected. (BZ#1428991)

  • On the project overview, the Application drop-down menu was incorrectly set to overflow:hidden. As a result, when the application row is collapsed, the menu did not display fully. The overflow: hidden parameter has been removed and the menu is now fully visible. (BZ#1460153)

  • Previously, deleting a service account would ignore the service account’s namespace. This meant that the delete action from the web UI could delete multiple service account rolebindings under the service account tab if service accounts from different namespaces had the same name. The delete action on the service account tab now respects the namespace and only deletes the specified service account rolebinding from the correct namespace. (BZ#1507730)

  • The Configuration tab of the Deployment page in the web console was laid out in such a way that a large gap could appear when the right column contents were longer than the left column contents. The fix involved changing the layout markup so the gap does not appear. The result is there is no longer a gap between Volumes and Triggers when the right column content is longer than the left column content. (BZ#1505255)

Master

  • Ansible installed the service catalog API service with a caBundle, resulting in a 500 Internal Server Error on the product overview page in the web console. The installer was changed to install with the insecureSkipTLSVerify flag set to true. As a result, the product overview page works as expected. (BZ#1473523)

  • CronJobs are placed in the batch/v2alpha1 group, whereas other batch resources are placed in batch/v1. Due to this fact, some API machinery did not handle multi-versioning properly. The restmapper, which is responsible for matching a resource with the appropriate API group version to handle multi-versioned APIs, was updated. Describing resources now works as expected. (BZ#1480453)

  • The installer was configured to watch specific resources that do not support watching. As a result, the /var/log/messages file was reporting errors and warnings related to the issue. The installer has been corrected to not watch these resources and the errors/warnings are not generated. (BZ#1452206)

  • Creating a project using a project template did not use the substituted project name, but the namespace name. As a result, the user was not able to use a parameterized name as a project name because the generated suffix or prefix might be dropped. The code was changed to use the substituted project name when creating the namespace. (BZ#1454535)

  • Node status information was getting rate limited during heavy traffic, causing some nodes to fall into NotReady status. The code was changed to use a separate connection for node health reporting. As a result, node status is reported without any problems. (BZ#1464653)

  • Running multiple clusters in a single availability zone in AWS requires resources to be tagged. If the clusters are not tagged, the clusters will not work properly. The master controllers process now requires a ClusterID on resources in order to run. Existing resources will need to be tagged manually. Multiple clusters in one availability zone will work properly once tagged. (BZ#1468579)

  • An upstream patch caused an error with the oc apply command when a patch deleted an element from one array (for example, env) and then reordered or modified another array (for example, volumeMounts). The kubectl apply command failed with the error unable to find api field in struct Container for the json field "$setElementOrder/env". The algorithm was updated so that it continues operation under the described condition. The oc apply command now works without any problems. (BZ#1497325)

Metrics

  • When either a certificate within the chain at serviceaccount/ca.crt or any of the certificates within the provided truststore file contain white space after the BEGIN CERTIFICATE declaration, the Java keytool rejects the certificate with an error, causing Origin Metrics to fail to start. As a workaround, Origin Metrics will now attempt to remove the spaces before feeding the certificate to the Keytool. Admins should ensure their certificates don’t contain such spaces. (BZ#1503450)

  • When deleting a large number of pods, the hawkular-metrics pod log reports Pool is busy errors. The condition was fixed upstream in Cassandra and clusters with a large number of pods should not report the Pool is busy error. (BZ#1451209)

  • When opening the metrics page in a disconnected environment, Hawkular attempted to connect to external web sites, such as fonts.googleapis.com. Because the cluster cannot connect to the Internet, the metrics page loaded slowly. Changes were made upstream so that Hawkular does not attempt to connect to external web sites when there is no access to the Internet. As a result, in a disconnected environment, the metrics page loads properly. (BZ#1466403)

  • In Cassandra, it is possible that new generation objects (with the -Xmn flag) can exceed the maximum size of the Java memory heap (with the -Xmx flag). If that happens, the JVM will log a warning at start up, but Cassandra still starts. The code was changed to set the size of new generation objects at ¼ of the maximum heap size. (BZ#1471239)

  • Cassandra metrics would not start up if the commit log exceeded the limit applied to the log. An out-of-memory (OOM) condition would cause metrics to constantly start and stop. The commit log size is now based on total available memory. Also, log compression is no longer used, which will reduce the demand on resources. As a result, large logs should not affect metrics operation. (BZ#1473013)

Networking

  • When changes were made to the software-defined network (SDN) plug-in, the master controller would fail to start when there were headless services in the cluster. As a result, when initializing OpenShift Container Platform, the SDN failed to allow a nil service IP and OpenShift Container Platform was unable to start. The code was changed to allow nil as a valid value of srv.Spec.ClusterIP. The OpenShift Container Platform SDN now starts properly after changing the network plug-in with headless services present. (BZ#1451881)

  • The node’s local IP address was not part of the Open vSwitch (OVS) rules. If you denied 0.0.0.0/0 and allowed a DNS name in the egress network policy, the node was not able to reach that allowed address because DNS name resolution was blocked. The local node IP is now added to the OVS allow rules so that name resolution is not blocked, and a note has been added to the documentation for the case when DNS resolution does not happen on the node. OpenShift Container Platform can now successfully block 0.0.0.0/0 as a cidrSelector while allowing specific DNS names through. (BZ#1458849)
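
    For illustration, a minimal EgressNetworkPolicy sketch that allows one DNS name and denies everything else; the policy name, namespace, and host value are assumptions:

      apiVersion: v1
      kind: EgressNetworkPolicy
      metadata:
        name: allow-example
        namespace: myproject
      spec:
        egress:
        - type: Allow
          to:
            dnsName: www.example.com
        - type: Deny
          to:
            cidrSelector: 0.0.0.0/0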

  • If the service network restart command is executed on a machine while the OpenShift Container Platform node process is running, a stop() function properly disables IP forwarding. However, the start() function was not re-enabling it. The code was changed to persist IP forwarding on nodes during network restarts. (BZ#1477716)

  • While upgrading nodes, if any invalid network CIDRs are detected, nodes might be unable to upgrade and will fail. The code was changed to not fail with invalid CIDRs. (BZ#1506017)

  • The Kubernetes CNI (Container Network Interface) plug-in generates errors if hostNetwork=true is configured for pods. This issue has been fixed. (BZ#1507257)

  • Because of upstream issues in Kubernetes, vSphere had networking problems when used with OpenShift Container Platform. The periodic resync of Kubernetes into OpenShift Container Platform included the required changes. vSphere now works correctly. (BZ#1433236)

  • Because of changes with upstream Kubernetes, the oadm join-projects, oadm isolate-projects and other commands that depend on the pod update operation will not work. The code was changed to fetch some required elements from the Container Runtime Interface (CRI) directly. As a result, the pod update operation works correctly and the commands work as expected. (BZ#1453190)

  • Because of default authorization, project administrators (standard users) were not able to manage network policies for their own projects. Changes to the code now allow project administrators to create, delete, and list network policies in their own projects. (BZ#1461208)

  • An invalid HostSubnet could not be fixed. As a result, if a node with an invalid HostSubnet was restarted, the node assigned to that HostSubnet would fail to start. The code has been changed to allow an invalid HostSubnet to be changed, using commands such as oc edit hostsubnet. (BZ#1466239)

  • Adding an IPv6 address to a host subnet as an egress resulted in a panic error. The code has been changed to better handle IPv6 addresses with a meaningful error message. (BZ#1500664)

  • Using ipfailover when a node fails ensures that a second node receives traffic. Previously, traffic went back to the first node once it was back up, potentially causing traffic imbalance. Now, the --preemption-strategy="nopreempt" option allows the administrator to control the preemption strategy, meaning that the switch back to a higher priority node can be suppressed. (BZ#1465987)
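
    For example, a hypothetical ipfailover invocation using the new option; the deployment name, virtual IP, and other flags are illustrative:

      oc adm ipfailover ipf-ha-router \
          --virtual-ips=192.168.1.100 \
          --preemption-strategy="nopreempt" \
          --create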

  • A log message similar to the following was repeatedly appearing:

    LoadBalancerRR: Removing endpoints for ops-health-monitoring/pull-07062050z-ie:8080-tcp

    This caused the logs to be filled with information not deemed important. The message has been hidden from the logs. (BZ#1468420)

  • Previously, the image for the default network diagnostics pod was mismatched, causing the diagnostics to fail. The image checking has been fixed, and the network diagnostics works without errors. (BZ#1481147)

  • Previously, conntrack entries for UDP traffic were not erased when an endpoint was added for a service that previously had no endpoints. This meant that the system could end up incorrectly caching a rule that would cause traffic to that service to be dropped rather than being sent to the new endpoint. The relevant conntrack entries are now deleted at the right time, so UDP services work correctly when endpoints are added and removed. (BZ#1487438)

Pod

  • Previously, network debug tests were showing errors about not being able to read stats from a changing pod. This was because, even though the container process had exited, the cgroup was not removed, leading to a Docker container with no tasks. The log spam has been reduced. (BZ#1328913)

  • Because of an outdated Go format, kubemark-scale was consistently failing. The version of Golang was updated, stopping the failures. (BZ#1454239)

  • Previously, the HPA V1 was unable to get the metrics from the resource CPU. This was due to the custom setup of the HPA controller changing. The settings have been restored. (BZ#1458663)

  • Previously, multi-node environments produced “Failed to watch” errors. This was because the controller didn’t have permission to watch resources, which meant its behaviour was to retry every second by default. The controller has been given the permission to watch resources. (BZ#1465361)

  • Previously, the OpenShift master failed to start when using OpenStack integration without Neutron LBaaS, which is not available in OpenShift. The issue now produces a warning instead of a failure, which means the master will start successfully even if LBaaS is not available. (BZ#1465722)

  • Previously, projected volumes were not included in security context constraints, meaning that pods could not use projected volumes. The projected volumes have been added to the correct SCCs, and projected volumes can now be used as expected. (BZ#1448816)

  • Init containers with resource requests or limits were producing error messages. This was due to a mismatch in the sum of a pod’s container resources, resulting in the parent cgroup choosing the incorrect resource. The issue has been fixed upstream and the correct resources are being chosen. (BZ#1459826)

  • Previously, when a deployment configuration was created without any memory information when quota restrictions were in place, no error message would appear. The expected results were a “FailedCreate” event, much like with replication controllers. The “FailedCreate” event now appears when the pod immediately fails. (BZ#1465801)

  • A design limitation in previous versions does not account for memory-backed volumes against the pod’s cumulative memory limit. So, it is possible for a user to exhaust memory on the node by creating a large file in a memory-backed volume, regardless of the memory limit. Now, pod-level cgroups have been added to, among other things, enforce limits on memory-backed volumes, resulting in memory-backed volume sizes now being bound by cumulative pod memory limits. (BZ#1422049)

  • Previously, upgrading to 3.4 gave an “insufficient pods” error. This was due to a change in configuration from a max-pods variable to the smaller of 250 or 10 pods per core. The upgrade broke installations with fewer pods. The change has been made so that the max-pods variable is now the limiting variable. (BZ#1430484)

  • Previously, error messages in the status field of failed builds said “error” instead of an actual error message. This was because the status was showing the message from the Docker daemon returning the failed pod message. The message now returns a more helpful error message. (BZ#1449820)

  • Previously, registry pods were occasionally reporting liveness and readiness probe failures with the message http2: no cached connection was available. This was due to an upstream issue where the liveness and readiness probes get in the way of each other. The problem has been fixed upstream, and updated for OpenShift Container Platform version 3.7. (BZ#1454858)

  • Large clusters with a large number of HPAs or unhealthy pods sent a large number of events if an object was unable to reach its desired state. This bug fix updates the event client to protect against spamming master components. As a result, traffic to the masters is controlled and writes to etcd are reduced. (BZ#1466933)

  • For all resources other than pod or PVCs, the quota controller would make a LIST call per namespace to determine current usage counts. This caused quota recalculation to take an extended period of time. This bug fix reduces LIST calls made by the resource quota controller by using shared informer caches. As a result, LIST operations made to the master were reduced and information was pulled from a shared cache in the controller. (BZ#1473370)

  • Previously, users were not able to look up PVC information for the Drupal database without receiving scheduler log spam. This bug fix prevents unnecessary logging of a harmless error from a PVC-related scheduler predicate. (BZ#1475558)

  • Previously, messages originating from the AWS SDK caused partial log entries due to new lines in the message itself. Error messages are now properly quoted so that each message is logged as a single, complete entry. (BZ#1462445)

Routing

  • Previously, the help information included a redundant example. This bug fix removed the redundant example. As a result, the help information is now more concise. (BZ#1440620)

  • Previously, the code path automatically prepended the partition name to the vserver name. If the vserver was in a path of length more than 1, then the path was lost because only the partition name was prepended. This bug fix prepends the entire path of vserver instead of just concatenating the partition name and vserver name. (BZ#1465304)

  • Previously, if you had a router from a previous version of OpenShift Container Platform, a 403 HTTP status resulted when the router stats were accessed without credentials. The web browser did not prompt the user for a password, so the stats were inaccessible. The code has been updated to return a 403 when no credentials are passed, and the browser now prompts the user for a password, so the router stats are visible in a web browser. (BZ#1467257)

  • Previously, the IP failover keepalived image did not support IPv6 addresses or ranges, or IP address validation. Adding IPv6 addresses to the oadm ipfailover command resulted in a new vrrp section pertaining to the wrong address. The code has been updated, and inputting invalid IPv4 and IPv6 addresses now returns an error as expected. (BZ#1459960)

  • Previously, the x-forwarded header and its associated information displayed the IPv6 form in IPv4 form. The ROUTER_IP_V4_V6_MODE environment variable has been created to control which form is displayed. (BZ#1471255)
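
    For example, the variable can be set on the router deployment configuration; the value shown is an assumption for illustration:

      oc set env dc/router ROUTER_IP_V4_V6_MODE=v4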

  • Previously, the locking was overly broad, causing events to not be processed while an HAProxy reload was happening. This meant that route changes could take hours to process. The locking has been made more fine-grained so that events can be processed in parallel, and changes are now processed within the time of two router reloads. (BZ#1471899)

  • An error in the router code, caused by missing locking around a router data structure, caused the router pod to occasionally crash and restart. The locking has been fixed, and the router now works as expected. (BZ#1473031)

  • When running the oc adm router --expose-metrics command, the router deployment failed because the generated deployment configuration object was not compatible. This was due to a background change upstream. A change has been made with the oc adm router command, and the command can now handle --expose-metrics. (BZ#1488954)

  • Previously, multiple service catalog objects named “default” were not a problem, but a change made them all top level. This bug fix makes the object names unique. (BZ#1420543)

Service Broker

  • Previously, a fresh installation using the openshift-ansible method with a service catalog resulted in the service class being empty when the stage registry gave a bad response. The administrator would need to check the ASB logs and trigger a manual bootstrap. Now, if the bootstrap fails, the broker fails, and the kubelet retries the process until it works correctly. (BZ#1468173)

  • Running the service-catalog apiserver and controller manager binaries with the --version option previously reported UNKNOWN. They now report the correct value. (BZ#1476134, BZ#1475251)

  • Previously, when deleting a namespace, the Ansible Service Broker (ASB) attempted to execute deprovision playbook actions using a namespace in a "terminating" state. This led to the APB actions being rejected because the namespace was terminating. As a result, deprovision failed, and both the APB deprovision sandbox and the target namespace were not deleted. Now, instead of executing APB actions on namespace deletion, the records of the services to be deprovisioned are cleaned up, allowing Kubernetes to delete the resources normally, and the target namespace is properly deleted by Kubernetes. (BZ#1476173)

  • The error message returned when a user does not have permission to modify a TemplateInstance is updated. (BZ#1460145)

  • Previously, only one annotation was returned when both the expose and base64-expose annotations were defined in a template (per bind request). This issue is fixed in the latest release. (BZ#1463570)

  • Previously, Ansible Playbook Bundles (APBs) that had been removed from their container catalog appeared in the Ansible Service Broker (ASB) as valid options even after a bootstrap was performed. This issue is now fixed. (BZ#1463798)

  • Previously, there was an inconsistency between the serviceclass and the service broker. After creating a broker, the controller-manager only fetched the catalog once, which made it impossible to update the serviceclass unless the broker was recreated. This is now fixed. (BZ#1469448)

  • Previously, the Ansible Service Broker would fail on provisioning because of incorrect permissions. This is now fixed, and the Ansible Service Broker has the required permissions for creating new namespaces and dynamic service accounts in those namespaces to run APBs. (BZ#1469485)

  • The oc version command did not report the OpenShift version against an Ansible-deployed service catalog environment. The version information has been added, and the command now reports the correct information. (BZ#1471717)

  • Previously, when the Ansible Service Broker started, it could not communicate with the configured registry and therefore got no information about APBs. This was because of a missing setting in the Ansible Service Broker configuration. The broker: bootstrap_on_startup: true setting is now added to the configuration, which resolves this issue. (BZ#1471973)

  • Previously, the Ansible Service Broker container would fail if the Docker Hub credentials were not supplied, because the encryption script required them. It is now reconfigured to use the RHCC adapter, and the Docker Hub credentials are optional. (BZ#1464222)

  • Previously, bad data was being returned from the bootstrapped registry because the broker failed to bootstrap and would error out due to a null pointer dereference. The broker now has logic to avoid dereferencing null pointers if the data is corrupted. This issue is now resolved: the broker skips images with bad data and continues with the next one. (BZ#1467905)

  • The Service Broker installer was setting an incorrect configuration value for launchapbonbind. This is fixed, and the configuration value is now set as launch_apb_on_bind. (BZ#1467948)

  • The role for Service Accounts used by the Ansible Service Broker is updated. The Broker runs under asb service account set to admin through a ClusterRoleBinding and APBs run under a temporary service account granted edit through a RoleBinding in the target namespace. (BZ#1470824)

Storage

  • Creating a new persistent volume claim (PVC) using OpenStack Cinder storageclass resulted in the PVC being stuck in Pending state. This bug fix re-configured the cloud provider openstack.conf to use OpenStack Keystone V3. As a result, dynamic provisioning of new Cinder volumes works as documented. (BZ#1491331)

  • Previously, the Gophercloud library used by OpenShift to communicate with the OpenStack API did not accept HTTP status 300 in pagination. It was not possible to dynamically provision OpenStack Cinder volumes. This bug fix upgrades the Gophercloud library in the OpenShift vendor directory. As a result, dynamic provisioning of new Cinder volumes works as documented. (BZ#1490768)

  • Previously, the default bootstrap policy allowed basic users to “get” storage classes, but not “list” storage classes. Basic users would receive an error message after issuing the oc get storageclass storageclass_name command. This bug fix modified the bootstrap policy. As a result, basic users can now issue the oc get storageclass storageclass_name command to retrieve specific storage classes. (BZ#1449608)

  • Previously, the lack of cloud provider configuration in the admission plug-in caused persistent volume (PV) creation to fail when attempting to create the PV in a zone other than master. This bug fix enables static PV provisioning in multizone environments. As a result, users can now statically provision PVs in zones other than master. (BZ#1454601)

  • Previously, when creating storage classes, users could not specify the fstype. This bug fix allows specifying the desired fstype when dynamically provisioning volumes with storage classes. As a result, storage classes now support file system configuration when creating dynamically provisioned volumes. (BZ#1469001)
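
    For illustration, a StorageClass sketch that specifies a file system type; the provisioner, parameter names, and values are example assumptions:

      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: gp2-xfs
      provisioner: kubernetes.io/aws-ebs
      parameters:
        type: gp2
        fsType: xfs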

  • Previously, it was not possible to dynamically provision ScaleIO volumes if the ScaleIO volume plug-in was not enabled. This bug fix enables the ScaleIO volume plug-in in OpenShift Container Platform 3.7. As a result, it is now possible to dynamically provision ScaleIO volumes. (BZ#1482274)

  • When trying to mount or unmount, the FlexVolume plug-in’s file system previously assumed that SELinux was supported. This assumption instructed docker to relabel the volume. If the FlexVolume plug-in’s file system did not support file system relabeling, the container using the FlexVolume would fail to start. This bug fix added the selinuxRelabel capability, which FlexVolume plug-ins can report in their init call. As a result, FlexVolume plug-ins can now be configured to opt out of SELinux relabeling. (BZ#1484899)
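
    As an illustration, a FlexVolume driver might report the capability in the JSON response to its init call, roughly as follows (a sketch; field names other than selinuxRelabel follow the usual FlexVolume status format):

      {
        "status": "Success",
        "capabilities": {
          "attach": false,
          "selinuxRelabel": false
        }
      }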

Templates

  • Previously, the service catalog could not provide authentication when invoking the template service broker, which meant the template service broker API had to allow calls from unauthenticated clients. This bug fix allows the service catalog to use proper authentication to invoke the template service broker when issuing the oc cluster up command to run both. As a result, the template service broker APIs will now be secured, and will only be invokable by the service catalog (or another client with appropriate credentials). (BZ#1470628)

Upgrade

  • Previously, the master node upgrade took more disk space than was initially estimated. This caused the etcd member to report a no space left on device error message. This bug fix increased the estimation of disk space needed before the master node upgrade can start. As a result, a master node is properly upgraded with enough disk space left after the upgrade finishes. (BZ#1489182)

  • Previously, the upgrade playbooks incorrectly overwrote nondefault admissionConfig parameters while setting specific values required of the upgrade process. This bug fix removed this task as it is no longer necessary after upgrading from OpenShift Container Platform 3.4 to OpenShift Container Platform 3.5. (BZ#1486054)

  • Previously, the etcd v3 data migrated prior to the first etcd v2 snapshot being written. Without a v2 snapshot, the v3 data was not propagated properly to the remaining etcd members, which resulted in a loss of some v3 data. This bug fix checks to see if there is at least one v2 snapshot before etcd data migration proceeds. As a result, etcd v3 data is now properly distributed among all etcd members. (BZ#1501752)

  • When trying to upgrade OpenShift Container Platform with dedicated etcd from v3.6 to v3.7, the upgrade failed at the [Stop atomic-openshift-master-controllers] task due to the wrong hosts group. This bug fix corrected the host group to specify the masters group for controller restart. As a result, the upgrade now succeeds. (BZ#1504515)

  • Previously, if Ansible tags were used to evaluate some of the tasks in a set of playbooks, the conditional for including a task file was not properly evaluated. This caused the upgrade to fail. This bug fix allows the conditional to evaluate properly and skip running the task. (BZ#1464025)

  • Ansible playbooks now exit immediately when health checks fail. Previously, in some instances, a host failure would not result in the playbook exiting during failed health checks. This bug fix sets the any_errors_fatal play option to true, ensuring that the playbook exits as expected. (BZ#1484324)

  • Upgrades that made use of system reboots to restart services may have failed if hosts took longer than 5 minutes to restart. This bug fix increases the timeout to 10 minutes. As a result, upgrades no longer fail when hosts take longer than 5 minutes to restart. (BZ#1455836)

Technology Preview Features

Some features in this release are currently in Technology Preview. These experimental features are not intended for production use. Please note the following scope of support on the Red Hat Customer Portal for these features:

The following new features are now available in Technology Preview:

The following features that were formerly in Technology Preview from a previous OpenShift Container Platform release are now fully supported:

The following features that were formerly in Technology Preview from a previous OpenShift Container Platform release remain in Technology Preview:

Known Issues

  • The installer can not deploy system container-based installations when the specified registry requires authentication credentials in order to pull the required system container images. The fix for this depends on an update to the atomic command, which will be updated after OpenShift Container Platform 3.7 GA. (BZ#1505744)

  • An OpenShift Container Platform 3.7 master will return an unstructured response instead of structured JSON when an action is forbidden. This is a known issue and will be fixed in OpenShift Container Platform 3.8.

  • The volume snapshot Technology Preview feature may not be available to non-administrator users by default due to API RBAC settings. When the volume snapshot controller and provisioner are installed and run, the cluster administrator needs to configure the API access to the VolumeSnapshot objects by creating roles and cluster roles, then assigning them to the desired users or user groups. (BZ#1502945)
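
    For example, a cluster role sketch granting users access to VolumeSnapshot objects; the API group, resource name, and verbs are assumptions based on the external-storage snapshot controller and should be adjusted to match the installed controller:

      apiVersion: rbac.authorization.k8s.io/v1beta1
      kind: ClusterRole
      metadata:
        name: volumesnapshot-admin
      rules:
      - apiGroups:
        - "volumesnapshot.external-storage.k8s.io"
        resources:
        - volumesnapshots
        verbs:
        - get
        - list
        - watch
        - create
        - delete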

  • OpenShift Container Platform is unable to list known health checks. (BZ#1509157)

  • The current format of audit logs is difficult to consume. Some keys are duplicates and some are misleading in that they match wrong keys in the linux-audit dictionary. (BZ#1496176)

Asynchronous Errata Updates

Security, bug fix, and enhancement updates for OpenShift Container Platform 3.7 are released as asynchronous errata through the Red Hat Network. All OpenShift Container Platform 3.7 errata is available on the Red Hat Customer Portal. See the OpenShift Container Platform Life Cycle for more information about asynchronous errata.

Red Hat Customer Portal users can enable errata notifications in the account settings for Red Hat Subscription Management (RHSM). When errata notifications are enabled, users are notified via email whenever new errata relevant to their registered systems are released.

Red Hat Customer Portal user accounts must have systems registered and consuming OpenShift Container Platform entitlements for OpenShift Container Platform errata notification emails to generate.

This section will continue to be updated over time to provide notes on enhancements and bug fixes for future asynchronous errata releases of OpenShift Container Platform 3.7. Versioned asynchronous releases, for example with the form OpenShift Container Platform 3.7.z, will be detailed in subsections. In addition, releases in which the errata text cannot fit in the space provided by the advisory will be detailed in subsections that follow.

For any OpenShift Container Platform release, always review the instructions on upgrading your cluster properly.